10 research outputs found

    Tiny Classifier Circuits: Evolving Accelerators for Tabular Data

    Full text link
    A typical machine learning (ML) development cycle for edge computing is to maximise performance during model training and then minimise the memory/area footprint of the trained model for deployment on edge devices targeting CPUs, GPUs, microcontrollers, or custom hardware accelerators. This paper proposes a methodology for automatically generating predictor circuits for the classification of tabular data, with prediction performance comparable to conventional ML techniques while using substantially fewer hardware resources and less power. The proposed methodology uses an evolutionary algorithm to search over the space of logic gates and automatically generates a classifier circuit with maximised training prediction accuracy. These circuits are so tiny (consisting of no more than 300 logic gates) that they are called "Tiny Classifier" circuits, and can be implemented efficiently as an ASIC or on an FPGA. We empirically evaluate the automatic Tiny Classifier circuit generation methodology, "Auto Tiny Classifiers", on a wide range of tabular datasets, and compare it against conventional ML techniques such as Amazon's AutoGluon, Google's TabNet, and a neural search over Multi-Layer Perceptrons. Despite Tiny Classifiers being constrained to a few hundred logic gates, we observe no statistically significant difference in prediction performance compared to the best-performing ML baseline. When synthesised as a silicon chip, Tiny Classifiers use 8-18x less area and 4-8x less power. When implemented as an ultra-low-cost chip on a flexible substrate (i.e., FlexIC), they occupy 10-75x less area and consume 13-75x less power compared to the most hardware-efficient ML baseline. On an FPGA, Tiny Classifiers consume 3-11x fewer resources.
    Comment: 14 pages, 16 figures
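    The core idea of the abstract, an evolutionary search that mutates a fixed-size feed-forward netlist of logic gates and keeps whichever variant scores best on training data, can be sketched in a few lines. This is a minimal illustrative (1+1) evolutionary loop over NAND gates under assumed conventions, not the paper's actual algorithm; the function names and parameters are hypothetical.

```python
import random

# A circuit is a list of gates; gate i applies NAND to two earlier signals
# (primary inputs or outputs of previous gates). The last signal is the class.
def evaluate(circuit, row):
    signals = list(row)
    for a, b in circuit:
        signals.append(1 - (signals[a] & signals[b]))  # NAND gate
    return signals[-1]

def accuracy(circuit, X, y):
    return sum(evaluate(circuit, r) == t for r, t in zip(X, y)) / len(X)

def evolve(X, y, n_gates=8, generations=300, seed=0):
    """Hypothetical (1+1) evolution strategy: mutate one gate per generation,
    keep the child if its training accuracy is at least as good."""
    rng = random.Random(seed)
    n_in = len(X[0])
    def rand_gate(i):
        limit = n_in + i          # gate i may only read earlier signals
        return (rng.randrange(limit), rng.randrange(limit))
    parent = [rand_gate(i) for i in range(n_gates)]
    best = accuracy(parent, X, y)
    for _ in range(generations):
        child = parent[:]
        i = rng.randrange(n_gates)
        child[i] = rand_gate(i)
        fit = accuracy(child, X, y)
        if fit >= best:           # accept equal fitness to allow neutral drift
            parent, best = child, fit
    return parent, best

# Toy task: XOR of two binary features, which genuinely needs gate composition.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
circuit, acc = evolve(X, y)
print(acc)  # best training accuracy found (often 1.0 on this toy task)
```

    A hand-built four-NAND XOR circuit, `[(0, 1), (0, 2), (1, 2), (3, 4)]`, is one exact solution in this representation, which shows why a few hundred gates suffice for simple decision functions.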

    Hermes: Architecting a top-performing fault-tolerant routing algorithm for networks-on-chips

    No full text
    Networks-on-Chips (NoCs) are experiencing escalating susceptibility to wear-out and reduced reliability, with the risk of becoming the key point of failure in an entire multicore chip. In this paper, we propose Hermes, a highly robust, distributed fault-tolerant routing algorithm whose performance degrades gracefully with increasing faulty NoC link counts. Hermes is a deadlock-free hybrid routing algorithm, utilizing load-balanced routing on fault-free paths while providing pre-reconfigured escape routes in the vicinity of faults. An initial experimental evaluation shows that Hermes improves network throughput by up to 2.2× when compared against the existing state-of-the-art.
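    The hybrid policy the abstract describes, load-balanced minimal routing over healthy links with a fall-back to a precomputed escape route near faults, can be sketched as a per-router next-hop decision on a 2D mesh. This is an illustrative sketch of the general concept only; the data structures and names are assumptions, not the paper's actual algorithm.

```python
def route(cur, dst, faulty_links, escape_next, load):
    """Pick the next hop on a 2D mesh. `faulty_links` is a set of
    (node, node) pairs, `escape_next[cur]` is a precomputed deadlock-free
    escape hop for the fault region, and `load` maps links to queue depth."""
    x, y = cur
    dx, dy = dst
    candidates = []                       # minimal (shortest-path) hops
    if dx != x:
        candidates.append((x + (1 if dx > x else -1), y))
    if dy != y:
        candidates.append((x, y + (1 if dy > y else -1)))
    healthy = [n for n in candidates if (cur, n) not in faulty_links]
    if healthy:                           # load-balance across healthy hops
        return min(healthy, key=lambda n: load.get((cur, n), 0))
    return escape_next[cur]               # near a fault: take the escape route

# Example: at (1, 1) heading to (3, 2) with the eastward link broken,
# the router still has a minimal healthy hop northward.
faulty = {((1, 1), (2, 1))}
escape = {(1, 1): (1, 0)}
print(route((1, 1), (3, 2), faulty, escape, {}))  # → (1, 2)
```

    The escape table stands in for the "pre-reconfigured escape routes" of the abstract: it is computed once per fault configuration, so the common fault-free case never pays its cost.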

    General vaccination knowledge influences nurses' and midwives' COVID-19 vaccination intention in Cyprus: a nationwide cross-sectional study

    No full text
    This cross-sectional study was conducted between 8 and 28 December 2020 to investigate the association between nurses' and midwives' level of vaccination knowledge and their acceptance of COVID-19 vaccination for themselves during the COVID-19 pandemic in Cyprus. Participants included registered nurses and midwives working in public or private service provision. Data were collected using a self-administered questionnaire with questions on socio-demographic characteristics, questions assessing participants' general vaccination knowledge, and questions related to COVID-19 vaccination. A total of 437 respondents answered the survey, 93% of them nurses and 7% midwives. The results indicate that as the vaccination knowledge score increases (higher knowledge), the probability of accepting COVID-19 vaccination increases too (OR = 1.30, 95% CI: 1.13-1.48). The association between vaccination knowledge and the intention to be vaccinated against COVID-19 remained statistically significant even after adjusting for age and gender (OR = 1.28, 95% CI: 1.12-1.47), socioeconomic (OR = 1.29, 95% CI: 1.12-1.48), and demographic characteristics (OR = 1.29, 95% CI: 1.11-1.49). Also, as age increases, the probability of accepting COVID-19 vaccination increases, while female respondents had a lower probability of accepting it than male respondents. This study demonstrated that COVID-19 vaccination acceptance is related to the vaccination knowledge of nurses and midwives in Cyprus. Targeted vaccination campaigns are needed to improve nurses' and midwives' level of vaccination knowledge, both to achieve better coverage among them and to positively influence their patients' ultimate vaccine decisions.
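    As a quick reading aid for the reported odds ratios: in logistic regression, each one-unit increase in a predictor multiplies the odds of the outcome by exp(β), so OR = 1.30 per knowledge point compounds multiplicatively. A minimal arithmetic sketch, using only the OR quoted in the abstract (the base odds value below is purely illustrative):

```python
import math

def odds_after_increase(base_odds: float, odds_ratio: float, delta: float) -> float:
    """Odds of the outcome after raising the predictor by `delta` units,
    given a per-unit odds ratio from a logistic regression."""
    return base_odds * odds_ratio ** delta

or_knowledge = 1.30               # reported OR per knowledge-score point
beta = math.log(or_knowledge)     # the underlying regression coefficient
print(round(beta, 3))             # 0.262
# A 3-point higher knowledge score multiplies the odds by 1.30**3 ≈ 2.2:
print(round(odds_after_increase(1.0, or_knowledge, 3), 2))  # 2.2
```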

    Implementation of the mutual information and transfer entropy algorithms on a supercomputer based on reconfigurable logic

    No full text
    A thesis submitted in fulfillment of the requirements for the degree of Diploma in Electrical and Computer Engineering.
    Summarization: It is widely known that contemporary applications are bounded by massive computational demands. With conventional CPUs falling out of favor due to their limitations, hybrid supercomputing using reconfigurable logic is a growing field in the area of computer systems. This thesis explores the Convey Computer, and more specifically the HC-2ex, a hybrid platform with increased computational capacity that combines a high-bandwidth memory interface with an architecture featuring multiple levels of computational parallelism. This platform was selected in order to map computationally intensive algorithms efficiently onto modern hardware. We address two challenging problems within this framework: the first is time-series analysis, focusing on the calculation of the Mutual Information (MI) statistic, and the second is the Transfer Entropy (TE) statistic between two time series. Mutual Information and Transfer Entropy have been addressed by the research community for low-precision arithmetic applications, but the performance of these algorithms has not been evaluated on platforms like the Convey Computer. This is the first work to extensively study this platform, identifying the pros and cons of the Convey Computer for computationally intensive algorithms and describing how these algorithms can be efficiently utilized. In terms of results, the Mutual Information and Transfer Entropy implementations are compared with architectures implemented on other platforms such as Maxeler. Compared to the reference software, the MI implementation yielded a 13x speedup and the TE implementation a 15x speedup for high-dimensional data using 32-bit precision arithmetic on the Convey HC-2ex.
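    For discrete series, the Mutual Information statistic that the thesis accelerates reduces to histogram counting followed by a log-sum, which is exactly the kind of work that maps well onto reconfigurable hardware. A minimal software sketch of the plug-in estimator (illustrative only, not the thesis implementation):

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in (histogram) estimate of MI in bits between two discrete
    series of equal length: sum over p(x,y) * log2(p(x,y) / (p(x)p(y)))."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

x = [0, 0, 1, 1]
print(mutual_information(x, x))            # 1.0: identical series share H(X) = 1 bit
print(mutual_information(x, [0, 1, 0, 1])) # 0.0: the series are independent
```

    Continuous signals would first be binned, and the O(n²) cost for all-pairs analyses is what motivates hardware acceleration.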

    A performance evaluation of multi-FPGA architectures for computations of information transfer

    No full text
    Summarization: Mutual Information (MI) and Transfer Entropy (TE) algorithms compute statistical measurements of the information shared between two dependent random processes. These measurements have focused on pairwise computations over time series in a broad range of fields, such as econometrics, neuroscience, data mining, and computer vision. Unlike previous works, which mostly focus on 8-bit computer vision applications, this work proposes the first generic hardware architectures for the acceleration of the MI and TE algorithms targeting any dataset on a realistic, multi-FPGA platform. We evaluate and compare two such systems, the Maxeler MAX3A Vectis and the Convey HC-2ex platforms, and provide insight into each one's benefits and limitations. All reported results are from actual experimental runs, including I/O overhead, and constitute lower bounds on our systems' full capabilities for large-scale datasets. These are compared to equivalent optimized multi-threaded software implementations, yielding ∼19x speedup vs. out-of-the-box software packages and ∼2.5x speedup vs. the highly optimized software presented in the related work. These hardware architectures use a small fraction of the FPGA resources and are limited by I/O bandwidth. This means that with near-future FPGA I/O capabilities, the performance of the architectures presented in this work for the O(n²) Mutual Information and the O(n³) Transfer Entropy problems will easily scale up.
    Presented at: 18th International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation
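    Transfer Entropy is the directional counterpart of MI: it measures how much the past of one series improves prediction of another beyond that series' own past, which is why it is the more expensive, O(n³)-class computation. A minimal plug-in estimator with first-order histories, as an illustrative sketch only (not the accelerated implementations of this work):

```python
import math
from collections import Counter

def transfer_entropy(x, y):
    """TE from y to x in bits, first-order histories:
    sum over p(x_{t+1}, x_t, y_t) * log2( p(x_{t+1}|x_t,y_t) / p(x_{t+1}|x_t) )."""
    triples = list(zip(x[1:], x[:-1], y[:-1]))      # (x_{t+1}, x_t, y_t)
    n = len(triples)
    p_xxy = Counter(triples)
    p_xy  = Counter((xt, yt) for _, xt, yt in triples)
    p_xx  = Counter((x1, xt) for x1, xt, _ in triples)
    p_x   = Counter(xt for _, xt, _ in triples)
    te = 0.0
    for (x1, xt, yt), c in p_xxy.items():
        num = c / p_xy[(xt, yt)]                    # p(x_{t+1} | x_t, y_t)
        den = p_xx[(x1, xt)] / p_x[xt]              # p(x_{t+1} | x_t)
        te += (c / n) * math.log2(num / den)
    return te

# y copies into x with one step of delay, so y "drives" x:
y = [0, 1, 0, 1, 0, 1, 0, 1]
x = [0] + y[:-1]
print(transfer_entropy(x, y))  # positive: y's past improves prediction of x
print(transfer_entropy(y, x))  # 0.0: y is fully predicted by its own past
```

    The asymmetry of the two printed values is the point of TE, and the triple-indexed counting is what makes it costlier than MI's pairwise histogram.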